In the health sector, computer-aided diagnosis (CAD) is a rapidly growing
technology because medical diagnostic systems offer a substantial
improvement over traditional practice. A huge volume of medical data is now
available, and a proper system is needed to turn it into useful
knowledge. Heart disease is the leading cause of death
worldwide. Heuristic algorithms have been shown to be effective in
supporting decision making and classification over the large quantity of
data produced by the healthcare sector. Classification is a prevailing
heuristic approach; some heuristic algorithms predict accurate results,
whereas others exhibit limited accuracy. This paper classifies the
presence of heart disease with a reduced number of attributes. Originally, 13
attributes are involved in classifying heart disease. A comparative analysis of
these techniques was done to determine how the combined techniques can
be applied to improve prediction accuracy in heart disease. Four main
classifiers were used to construct the heart disease prediction: support
vector machine (SVM), artificial bee colony (ABC), bat algorithm, and
memory-based learner (MBL). The experimental results demonstrate that they
provide efficient results. For the support vector machine, which is the most
accurate heuristic algorithm, the accuracy difference between 13 features and
8 features is 1.9% on the training dataset and 0.92% on the validation dataset.
1. INTRODUCTION 


Medical data analysis in healthcare is an important and complicated task because its results
must be both accurate and efficient. Computer-aided diagnosis (CAD) has become a very
useful tool that attempts to solve real-world health problems in the diagnosis and treatment of disease [1]. In
recent years, heuristic systems have found a significant hold in every field, including health care.
Medical history data includes a number of tests prescribed to diagnose a specific disease [2]. Clinical
databases are a domain where heuristic techniques have become indispensable
due to the steady growth of medical and clinical research data. Healthcare
industries can gain an advantage from heuristic techniques by employing them as an intelligent
diagnostic tool. Heuristic techniques are reported to achieve high accuracy and a high adoption rate in this field.
Many health care organizations face a major challenge in offering high-quality services, such as
diagnosing patients appropriately and organizing treatment at sensible costs [3], [4]. Heuristic techniques
have been widely used to mine information from medical databases. Among them, classification
is a form of learning that can be used to build models describing important data classes. These
techniques can support researchers and physicians in making medical decisions, and they can answer
significant and related questions regarding health care. Heart disease is difficult to recognize
because of several contributory risk factors such as diabetes, high blood pressure, high cholesterol, abnormal
pulse rate, and many others [5], [6]. The rest of this paper is organized as follows. The necessary background
on heart disease is discussed in Section 2. Section 3 presents the related work. In Section 4,
the statistical analysis and limitations are presented. In Section 5, we present the outstanding issues of
heuristic algorithms for heart disease prediction and their results. Section 6 presents the conclusion. For the
convenience of readers, the most frequently used acronyms in the paper are listed in
Table 1.


Table 1. List of abbreviations


Abbreviation  Meaning                    Abbreviation  Meaning
HD            Heart diseases             Bat           Bat algorithm
PH            Prediction approach        ABC           Artificial bee colony
MBL           Memory-based learner       DP            Deep learning
SVM           Support vector machine     M.SVM         Modified support vector machine
WHO           World Health Organization  UCI           University of California Irvine
CAD           Computer-aided diagnosis


2. DISEASES 

Disease is a term used for an abnormal condition that affects part or all of an organism.
Diseases are recognized by their symptoms and signs, and it is therefore necessary to optimize
disease prediction. According to a survey by the World
Health Organization (WHO), heart disease is the main cause of death in the world. Heart attacks occur in
different countries due to exertion, work overload, mental stress, diet, and so on. Treatment and diagnosis
are complicated, so designing a systematic system for diagnosis is an important step [7], [8]. The main
means of preventing this disease is early prediction. The main factors that
put a person at risk of heart disease are as follows. Age: men typically face heart attacks at around the age of 65 and
women after the climacteric, but men face more heart attacks than women.
Family history: heart disease is often seen to run in families and can be inherited from one
generation to the next. Smoking: chemicals in tobacco smoke promote blood clots and
increase the risk of heart attack, which is now more common in the young generation because of smoking.
Weight: an increase in body weight raises the chance of heart disease, and people who carry extra body
fat are more prone to it. Addressing any of these factors can help reduce the problem
[9], [10].


3. RELATED WORK 

Researchers have proposed several heuristic algorithms and techniques to predict heart
disease accurately; some of them are reviewed here. Agostino et al. [11] designed a neural network algorithm known as
self-organizing piecewise aggregate approximate (SOPAA). This algorithm is used for electrocardiogram
(ECG) signal classification to diagnose heart disease. According to their results, it reaches 97%, better
than the other techniques used. Other researchers implemented a genetic algorithm in a neural network for heart disease
prediction. They used 12 parameters such as sex, age, blood, and cholesterol. The algorithm determines the
number of hidden layer nodes and the architecture used to compute the result. According to their results, it
gives up to 98% accuracy. The work in [12] designed a prototype intelligent
heart disease prediction system (IHDPS) using a learning technique, namely ID3 decision trees. Using
patient medical profiles such as age, gender, blood pressure, blood sugar, chest pain, and an
electrocardiogram (ECG) graph, it predicts the likelihood of a patient getting heart disease, with a
reported effectiveness of 82%. The work in [13] designed a multilayer perceptron model for
analytical detection of heart disease severity based on various factors. The author also performs a novel
principal attribute analysis to understand the direction in which the attributes affect the results. The outcome
of this work is an assessment of heart disease severity based on the proposed multilayer perceptron model,
which is reported to be 97% accurate.

Ullah et al. [14] designed a scheme based on a decision tree and bagging algorithm for
heart disease caused by reduced blood and oxygen supply to the heart. Different attributes were used for comparison
with other algorithms, and the result shows a 96% improvement for this scheme [15]. For the diagnosis
of coronary artery disease, a decision tree is used to select the most significant attributes, and the output is
converted into crisp if-then rules. The crisp rules are transformed into fuzzy rules, and these rules
constitute the fuzzy rule base. The performance of the proposed system is analyzed using parameters
such as classification accuracy, sensitivity, and specificity, and the system achieves better
accuracy, about 98%, than existing systems [16]. Several data mining
techniques have also been applied and compared for predicting heart disease, using models based on five algorithms: C5.0, neural
network, support vector machine (SVM), k-nearest neighbor (KNN), and logistic regression. The C5.0
decision tree built the model with the greatest accuracy, 93.02%, whereas KNN, SVM, and neural
network had accuracies of 88.37%, 86.05%, and 80.23%, respectively [17]. A summary of the related work
is given in Table 2.


Table 2. Related work summary


Heuristic algorithms                   Year  Disease                  Dataset  Simulation  Accuracy rate (%)
Bayes Net/FT, Naive Bayes/SVM          2015  Coronary artery disease  UCI      WEKA        84.5
Naive Bayes/J48                        2015  Heart disease            UCI      WEKA        85.1
Bagging/SVM                            2018  Heart disease            UCI      WEKA        84.2
UCI                                    2019  Heart disease            UCI      WEKA        84.95
ANN                                    2019  Dengue disease           UCI      WEKA        85.56
MFNN                                   2019  Dengue disease           UCI      WEKA        84.56
C4.5                                   2019  Dengue disease           UCI      WEKA        84.56
Feed forward NN with back propagation  2019  Dengue disease           UCI      WEKA        84.56
CART                                   2019  Dengue disease           UCI      WEKA        82.56
K Star                                 2019  Dengue disease           UCI      WEKA        83.56


4. PROPOSED APPROACHES 

In this section, we present the heuristic classifiers used for heart
disease prediction in this paper. The Cleveland dataset from the University of California Irvine (UCI) repository is used for the heart
disease classification process; it offers an easy-to-use representation of the data for setting up the working environment and
building the predictive analysis. Applying the machine learning techniques starts with a pre-processing phase
followed by post-processing steps.


5. CLASSIFIER EVALUATION MEASURES 

In this research, attribute selection is performed using the weight of each lower-level
attribute, rather than weights learned through heuristic algorithms. In this section, we
take the information gain of each attribute as its weight. The basic concept of this study comes
from the SVM classification technique, with information gain used as the criterion for
selecting attributes [18], [19]. Information gain is widely used for feature set selection, so the technique applied
consists of computing it for each field. The information gain of attribute X with respect to the class attribute Y is
the reduction in uncertainty about the value of Y when we know the value of X, I(Y; X). The uncertainty
about the value of Y when we know the value of X is given by the conditional entropy of Y given X,
H(Y | X) [20]. Figure 1 presents the proposed approaches.
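
The attribute weight described above can be sketched in a few lines of Python. This is a minimal illustration, assuming discrete attribute values and the usual identity I(Y; X) = H(Y) - H(Y | X); the function names and the toy data are hypothetical and are not taken from the experiments.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """I(Y; X) = H(Y) - H(Y | X) for one discrete attribute X."""
    total = len(labels)
    # Group the class labels by the value the attribute takes.
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy: entropy of each group weighted by its share.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: a binary attribute and a binary disease label.
attribute = [1, 1, 1, 0, 0, 0, 0, 1]
disease = [1, 1, 1, 0, 0, 0, 1, 0]
gain = information_gain(attribute, disease)
```

Attributes with a larger gain reduce more of the uncertainty about the class and would receive a larger weight under this scheme.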


Figure 1. Proposed approaches structure (diagram labels: prediction technique; accuracy, sensitivity, precision)
5.1. Support vector machine (SVM) 

SVM is a supervised machine learning method commonly used for data
analysis in regression and classification problems. It is mostly used as a classifier, based on
the principle of plotting each data item as a point in n-dimensional space, where the value of each feature is the value
of the corresponding coordinate. It then finds the ideal hyperplane that separates the
two classes [21], [22]. Like other learning machines, the SVM takes input, processes it, and
provides an output that can later categorize new examples. The SVM classifier performs binary
classification of the feature space by hyperplane-based linear separation of the data. It is also used for nonlinear
regression and pattern classification, and it works on the following base rule [23], [24].


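$$\operatorname{sgn}(f(x, w, b)), \qquad f(x, w, b) = (w \cdot x) + b \tag{1}$$


where $x$ is the example to be classified. The maximum-margin hyperplane $(w, b)$ is obtained by solving an
optimization problem in $w$ under the following constraints [25], [26].


$$y_i((w \cdot x_i) + b) \geq 1 \tag{2}$$
$$(w \cdot x) + b = 0, \qquad w \in \mathbb{R}^n,\ b \in \mathbb{R} \tag{3}$$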
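
To make the decision rule in (1) and the margin constraint in (2) concrete, the following minimal sketch evaluates them for a fixed hyperplane. It does not train an SVM; the weights, bias, and sample points are hypothetical, and treating a decision value of exactly zero as the positive class is an assumption.

```python
def decision_value(w, x, b):
    """f(x, w, b) = (w . x) + b, as in (1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """sgn(f(x, w, b)); a value of exactly 0 is assigned to +1 (assumption)."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def satisfies_margin(w, b, samples):
    """Check constraint (2): y_i((w . x_i) + b) >= 1 for every training pair."""
    return all(y * decision_value(w, x, b) >= 1 for x, y in samples)


# Hypothetical hyperplane and two-feature training points (not from the paper).
w, b = [1.0, 1.0], -3.0
train = [([3.0, 2.0], 1), ([4.0, 1.0], 1), ([1.0, 1.0], -1), ([0.0, 1.0], -1)]
margin_ok = satisfies_margin(w, b, train)
label = classify(w, [2.5, 2.5], b)
```

In practice the pair (w, b) would be obtained from a solver or library rather than fixed by hand; the sketch only shows how a trained hyperplane is applied to a new example.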


5.2. Memory-based learner (MBL) 

The memory-based learner, also known as an instance-based learner, works on the nearest-neighbor
principle: it classifies an instance based on the training labels of its K nearest neighbors in the feature space. A
distance function is used to identify the nearest neighbors. The type of distance function depends on the
data type of the selected attributes [27], [28].


$$d(x_i, x_j) = \sum_{q \in Q} (x_{iq} - x_{jq})^2 + \sum_{c \in C} L_c(x_{ic}, x_{jc}) \tag{4}$$


where $Q$ and $C$ index the quantitative and categorical attributes, respectively, and $L_c$ is the $M \times M$ matrix
used to describe the distance between two categorical values. Let $N_i$ denote the set of $k$ nearest neighbors of an
instance $x_i$ under the distance $d$. The vote for each class $t \in T$ counts the neighbors whose label $y_k$ equals $t$
[29]-[33].


$$V_i(t) = \sum_{k \in N_i} I(t, y_k) \tag{5}$$
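
The distance in (4) and the vote in (5) can be sketched as follows. This is an illustrative nearest-neighbor classifier under stated assumptions: the categorical matrix L_c is replaced by a simple 0/1 mismatch cost, the quantitative attributes are assumed to be pre-scaled, and the rows are hypothetical.

```python
from collections import Counter


def mixed_distance(a, b, numeric_idx, categorical_idx):
    """Distance (4), assuming L_c(u, v) = 0 if u == v and 1 otherwise."""
    numeric = sum((a[q] - b[q]) ** 2 for q in numeric_idx)
    categorical = sum(0 if a[c] == b[c] else 1 for c in categorical_idx)
    return numeric + categorical


def knn_predict(query, train, k, numeric_idx, categorical_idx):
    """Vote (5): count the labels of the k nearest neighbors, return the winner."""
    ranked = sorted(
        train,
        key=lambda row: mixed_distance(query, row[0], numeric_idx, categorical_idx),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical rows: (scaled age, scaled max heart rate, chest pain type), label.
train = [
    ((0.30, 0.80, 2), 0),
    ((0.35, 0.75, 2), 0),
    ((0.40, 0.70, 1), 0),
    ((0.70, 0.40, 3), 1),
    ((0.75, 0.35, 3), 1),
    ((0.80, 0.30, 4), 1),
]
prediction = knn_predict((0.72, 0.38, 3), train, k=3,
                         numeric_idx=[0, 1], categorical_idx=[2])
```

An odd k avoids ties in a two-class problem; a full L_c matrix could replace the mismatch cost when some categories are closer to each other than others.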


5.3. Bat algorithm 
The bat algorithm generates a new solution by updating the speed at which each bat searches for prey in
each dimension of the search space. The following equation is used for the classification process.


$$v_{id}^{t} = v_{id}^{t-1} + \left(x_{id}^{t-1} - x_{d}^{*}\right) f_i \tag{6}$$


Here $d$ is the dimension of the search space, $v_{id}^{t}$ and $v_{id}^{t-1}$ denote the flight speed of
bat $i$ at times $t$ and $t-1$, respectively, $x_{id}^{t-1}$ denotes the position of bat $i$ at time $t-1$, and
$x_{d}^{*}$ represents the best position in the population up to iteration $t$ so far [34], [35]. The term $f_i$ is
the pulse frequency of bat $i$, as in the standard form of the bat algorithm.
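
A single update step of the kind described in (6) might look like the sketch below. It follows the commonly published form of the bat algorithm, which is an assumption here: the frequency range (f_min, f_max), the uniform frequency draw, and the position update x = x + v are not specified in this paper, and the function name is hypothetical.

```python
import random


def update_bat(position, velocity, best, f_min=0.0, f_max=2.0, rng=random):
    """One velocity and position update for a single bat.

    f_min, f_max and the uniform frequency draw are assumptions taken from
    the commonly published bat algorithm, not values given in this paper.
    """
    frequency = f_min + (f_max - f_min) * rng.random()
    # Velocity update: previous speed plus the offset from the best position,
    # scaled by the bat's frequency, applied per dimension.
    new_velocity = [
        v + (x - x_best) * frequency
        for v, x, x_best in zip(velocity, position, best)
    ]
    # Position update (assumed): move by the new velocity.
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity
```

A bat that already sits on the best position keeps its velocity unchanged, since the offset term vanishes in every dimension.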


5.4. ABC algorithm 

All the vectors of the population of food sources, x_m (m = 1...SN, SN:
population size), are initialized by scout bees, and the control parameters are set. Since each food source x_m is a solution
vector to the optimization problem, each x_m holds n variables (x_mi, i = 1...n), which are to be
optimized so as to minimize the classification objective.
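
The initialization step can be sketched as follows, assuming each variable is drawn uniformly between a lower and an upper bound. The bounds, the uniform draw, and the function name are illustrative assumptions; the paper does not state how the scout bees place the initial food sources.

```python
import random


def init_food_sources(sn, lower, upper, rng=random):
    """Create SN food sources; each is a vector of n variables within bounds."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(sn)
    ]


# Hypothetical example: 5 food sources with n = 2 variables each.
food_sources = init_food_sources(5, lower=[0.0, 10.0], upper=[1.0, 20.0])
```

Each inner list is one candidate solution x_m; the later phases of the algorithm would evaluate and refine these vectors against the objective.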


6. DATA PRE-PROCESSING 

The Cleveland heart disease dataset (UCI, 2016), maintained by the UCI
machine learning repository, is used for this study. The dataset consists of 303 patient instances,
6 of which contain missing values. Table 3 shows the dataset attributes with their definitions [36],
[37].
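
One simple way to handle the incomplete instances is to drop them before training. The sketch below assumes a comma-separated file with 13 attributes plus a class column and "?" as the missing-value marker; the rows shown are illustrative only, and the paper does not state which strategy was actually used.

```python
import csv
import io

# Illustrative rows in an assumed 14-column layout; "?" marks a missing value.
RAW = """63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
67,1,4,160,286,0,2,108,1,1.5,2,3,3,2
53,0,3,128,216,0,2,115,0,0.0,1,0,?,0
52,1,3,138,223,0,0,169,0,0.0,1,?,3,0
"""


def load_complete_rows(text, missing="?"):
    """Parse CSV text and keep only non-empty rows without a missing marker."""
    rows = csv.reader(io.StringIO(text))
    return [[float(v) for v in row] for row in rows if row and missing not in row]


clean = load_complete_rows(RAW)
```

Applied to the full dataset, dropping the 6 incomplete instances would leave 297 of the 303 records.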
Table 3. Dataset features information and description


Feature name                                  Feature code  Description                               Domain of values (min-max)
Age                                           AGE           Age in years                              20 < age < 80
Sex                                           SEX           Female = 1, male = 0                      0, 1
Resting blood pressure                        RBP           mm Hg on admission to the hospital        94-200
Type of chest pain                            CPT           1. atypical angina                        1
                                                            2. typical angina                         2
                                                            3. asymptomatic                           3
                                                            4. non-anginal pain                       4
Fasting blood sugar > 120 mg/dl               FBS           1 = true; 0 = false                       1, 0
Serum cholesterol                             SCH           In mg/dl                                  120-564
Resting electrocardiographic results          RES           1. having ST-T                            1
                                                            2. hypertrophy                            2
Maximum heart rate achieved                   MHR           -                                         71-202
Exercise-induced angina                       EIA           1 = yes                                   1
                                                            0 = no                                    0
Old peak = ST depression induced by           OPK           -                                         0-6.2
exercise relative to rest
Slope of the peak exercise ST segment         PES           1. up sloping                             1
                                                            2. flat                                   2
                                                            3. down sloping                           3
Number of major vessels (0-3) colored by      VCA           -                                         0-3
fluoroscopy
Thallium scan                                 THA           3. normal                                 3
                                                            6. fixed defect                           6
                                                            7. reversible defect                      7


7. RESULTS AND DISCUSSION 

In this section, we present the results of the proposed heuristic classifiers on the
UCI dataset. Before discussing the results, we describe the evaluation measures used to assess them.

In this study, we evaluated the prediction accuracy of the proposed models on the heart
disease dataset using the following measures, where TP, TN, FP, and FN denote the numbers of true positives,
true negatives, false positives, and false negatives:
i) Classification error: measures the incorrect classifications made by the classification model,
Error = (FP + FN) / (TP + FP + TN + FN) * 100.
ii) Classification accuracy: measures the overall performance of a classifier,
Accuracy = (TP + TN) / (TP + FP + TN + FN) * 100 [38], [39].
iii) Sensitivity (recall, true positive rate): the fraction of positive cases for which the diagnostic test is positive,
Sensitivity = TP / (TP + FN) * 100.
iv) Specificity: the fraction of healthy persons for whom the diagnostic test is negative,
Specificity = TN / (TN + FP) * 100.
v) Precision: the fraction of positive test results that are correct, Precision = TP / (TP + FP) * 100.
vi) True positive rate = TP / (TP + FN).
vii) False positive rate = FP / (FP + TN) [40]-[42].
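
The measures above can be computed directly from the four confusion-matrix counts, as in the following sketch. The counts are hypothetical and chosen only for illustration; precision is taken here as TP / (TP + FP), the usual definition, and classification error as the complement of accuracy.

```python
def evaluate(tp, fp, tn, fn):
    """Return the evaluation measures as percentages (rates as fractions)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total * 100,
        "error": (fp + fn) / total * 100,
        "sensitivity": tp / (tp + fn) * 100,
        "specificity": tn / (tn + fp) * 100,
        "precision": tp / (tp + fp) * 100,
        "tp_rate": tp / (tp + fn),
        "fp_rate": fp / (fp + tn),
    }


# Hypothetical confusion-matrix counts, chosen only for illustration.
scores = evaluate(tp=40, fp=10, tn=45, fn=5)
```

With these counts the accuracy is 85% and the error 15%, which shows that the two always sum to 100%.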


8. EXPERIMENTS 

In this study, Jupyter Notebook was used for heart disease prediction on the Cleveland
heart disease dataset. The diagnostic performance of the different heuristic algorithms is evaluated in terms of accuracy, precision,
sensitivity, and specificity. The contributing factors are
discussed with reference to Table 4 and Table 5.

Figure 2 shows the accuracy of the different machine learning approaches
for heart disease prediction. The SVM classifier provides the best result, with an
accuracy rate of 98.91%. The worst case is the memory-based learner, whose result was 93.62%. Table 6
presents the classification results of the proposed technique in terms of instances.


Table 4. Assessment of heuristic approach for heart disease prediction


Classifier            Accuracy  Sensitivity  Specificity  TP Rate  FP Rate
Memory-Based Learner  81.08%    86.25%       75.82%       0.8625   0.2410
Bat                   84.08%    86.25%       75.82%       0.8625   0.2410
ABC                   79.05%    83.12%       74.26%       0.8312   0.2573
SVM                   98.90%    97.45%       90.20%       0.92897  0.2175
Table 5. Result of different classifiers


Classifiers           Performance
Memory-Based Learner  90.74%
Bat                   93.62%
ABC                   98.089%
SVM                   98.91%

Figure 2. Result of different classifiers (chart of classifier performance; recoverable label: Memory-Based Learner 24%)


Table 6. Proposed classifier result


Instances                         TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area
Incorrectly Classified Instances  0.312    0.092    0.571      0.312   0.404      0.695
Correctly Classified Instances    0.908    0.688    0.571      0.908   0.834      0.695
Weighted Average                  0.74     0.52     0.714      0.74    0.712      0.695
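
The F-Measure column can be related to the Precision and Recall columns with a short calculation. The sketch assumes the F-measure is the usual harmonic mean of precision and recall; applied to the first row of Table 6 it gives a value that rounds to the reported 0.404.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# First row of Table 6: precision 0.571, recall 0.312.
row_f = f_measure(0.571, 0.312)
```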


Table 6 shows the training results of the proposed classifier approaches, where instances are
grouped into incorrectly and correctly classified. Several measures, including
recall and the receiver operating characteristic (ROC) area, are used to assess these instances. On all measures,
the values for the correctly classified instances are better, indicating that the classifiers performed well during
the simulation. Figure 3 presents the results of the heuristic classifiers used in the training and
testing parts of the study. Three groups are examined:
incorrectly classified instances, correctly classified instances, and the weighted average. Based on the results, the proposed approaches
perform well and most instances are classified correctly. Table 7 shows the results of the classifiers in terms
of time, one of the most important parameters of classification.
Based on the table, SVM and the radial basis function network are the best case in terms of time and accuracy.

Table 7 also reports the mean absolute error (MAE) alongside the time measurement. The construction times are
short, primarily because it takes only a few milliseconds to compute the accuracy. Figure 4 presents the
time consumption of the different machine learning classifiers during processing, and Table 8 presents
the overall time of all classifiers over different data ranges.


Figure 3. The result of the classifier (chart of the TP rate, FP rate, precision, and related values for incorrectly classified, correctly classified, and weighted average instances)
Table 7. Time comparison of classifiers


Classifiers           Performance  Time of construction  MAE
Memory-Based Learner  90.5         0.02                  0.044
Bat                   91.2         0.08                  0.00019
ABC                   88.3         0.06                  0.117
SVM                   99.2         0.02                  0.044


Figure 4. Time-based comparison of classifiers (chart of classifier performance; recoverable labels: SVM 27%, Memory-Based Learner 24%)


Figure 5 and Table 8 present the overall time consumption of the heuristic techniques. All the models
perform well and improve on every base classifier by providing accurate predictions of heart
disease. The overall time is consistent with the classifier results, so we can say that machine
learning approaches are well suited to a medical dataset.


Table 8. Overall time consumption


Classifiers           Performance  Time for construction
Memory-Based Learner  52.33%       609 ms
Bat                   52%          719 ms
ABC                   45.67%       700 ms
SVM                   97.67%       600 ms


Figure 5. Overall time consumption of heuristic algorithms (bar chart of classifier performance, 0% to 120%, for Memory-Based Learner, Bat, ABC, and SVM)


9. CONCLUSION 

In this paper, we analyzed the accuracy of heart disease prediction using machine learning
classifiers on a dataset from UCI. The different classifiers were
used for training and testing. The main classifiers used are support vector machine, bat, ABC,
and memory-based learner. Based on the results, SVM achieved an accuracy of 97.90% with 90.96%
sensitivity and 98.83% specificity, making it the most accurate classifier among them. Future research directions
include a weighted voting-based classifier ensemble and application of the proposed algorithm to other
diseases, such as diabetes and cancer, for classification and prediction.